AITopics | approximation architecture

A new convergent variant of Q-learning with linear function approximation

Neural Information Processing Systems | Aug-16-2025, 22:40:49 GMT

In this paper, we investigate the convergence of reinforcement learning with linear function approximation in control settings.
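
For orientation, the following is a minimal sketch of the standard Q-learning update with linear function approximation, i.e. the baseline setting the abstract refers to. It is not the convergent variant proposed in the paper, which the abstract does not describe. The function names, feature layout, and step-size values are illustrative assumptions.

```python
import numpy as np

def q_values(theta, phi_state):
    """Q(s, a) for every action; phi_state has one feature row per action."""
    return phi_state @ theta

def q_learning_step(theta, phi_sa, reward, phi_next, alpha=0.1, gamma=0.9):
    """One standard (semi-gradient) Q-learning update with linear features.

    phi_sa   : features of the visited state-action pair, shape (d,)
    phi_next : features of the next state, one row per action, shape (A, d)
    alpha and gamma are assumed values, not taken from the paper.
    """
    td_target = reward + gamma * np.max(q_values(theta, phi_next))
    td_error = td_target - phi_sa @ theta
    # Move the weights along the feature vector of the visited pair.
    return theta + alpha * td_error * phi_sa
```

The `max` over next-state actions is what distinguishes the control setting from policy evaluation, and it is the reason this plain update is not guaranteed to converge in general.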

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Portugal > Lisbon > Lisbon (0.04)
North America > Canada (0.04)

Industry: Transportation (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.64)


Non-parametric Approximate Dynamic Programming via the Kernel Method

Neural Information Processing Systems | Mar-14-2024, 11:52:40 GMT

This paper presents a novel non-parametric approximate dynamic programming (ADP) algorithm that enjoys graceful approximation and sample complexity guarantees. In particular, we establish both theoretically and computationally that our proposal can serve as a viable alternative to state-of-the-art parametric ADP algorithms, freeing the designer from carefully specifying an approximation architecture. We accomplish this by developing a kernel-based mathematical program for ADP. Via a computational study on a controlled queueing network, we show that our procedure is competitive with parametric ADP approaches.

approximation, approximation architecture, rsalp, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Unsupervised Basis Function Adaptation for Reinforcement Learning

Barker, Edward, Ras, Charl

arXiv.org Machine Learning | Mar-23-2017

When using reinforcement learning (RL) algorithms to evaluate a policy it is common, given a large state space, to introduce some form of approximation architecture for the value function (VF). The exact form of this architecture can have a significant effect on the accuracy of the VF estimate, however, and determining a suitable approximation architecture can often be a highly complex task. Consequently there is a large amount of interest in the potential for allowing RL algorithms to adaptively generate (i.e. to learn) approximation architectures. We investigate a method of adapting approximation architectures which uses feedback regarding the frequency with which an agent has visited certain states to guide which areas of the state space to approximate with greater detail. We introduce an algorithm based upon this idea which adapts a state aggregation approximation architecture on-line. Assuming $S$ states, we demonstrate theoretically that - provided the following relatively non-restrictive assumptions are satisfied: (a) the number of cells $X$ in the state aggregation architecture is of order $\sqrt{S}\ln{S}\log_2{S}$ or greater, (b) the policy and transition function are close to deterministic, and (c) the prior for the transition function is uniformly distributed - our algorithm can guarantee, assuming we use an appropriate scoring function to measure VF error, error which is arbitrarily close to zero as $S$ becomes large. It is able to do this despite having only $O(X\log_2{S})$ space complexity (and negligible time complexity). We conclude by generating a set of empirical results which support the theoretical results.
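
To give a feel for the scales in condition (a) and the space bound, the sketch below evaluates $\sqrt{S}\ln{S}\log_2{S}$ and $X\log_2{S}$ for a given number of states. Order notation hides constant factors, so the values are only indicative; the function names are hypothetical.

```python
import math

def min_cells(num_states):
    """Order-of-magnitude cell count sqrt(S) * ln(S) * log2(S) from condition (a)."""
    s = num_states
    return math.sqrt(s) * math.log(s) * math.log2(s)

def space_order(num_cells, num_states):
    """Order-of-magnitude space term X * log2(S), ignoring constants."""
    return num_cells * math.log2(num_states)

if __name__ == "__main__":
    for exponent in (20, 40, 60):
        s = 2 ** exponent
        x = min_cells(s)
        print(f"S = 2^{exponent}: X ~ {x:.3g}, X/S ~ {x / s:.3g}, "
              f"space ~ {space_order(x, s):.3g}")
```

The ratio $X/S$ shrinks as $S$ grows, which is the sense in which the architecture stays small relative to the state space.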

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

1703.0794

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Unsupervised Basis Function Adaptation for Reinforcement Learning

Barker, Edward W., Ras, Charl J.

arXiv.org Machine Learning | Mar-2-2017

When using reinforcement learning (RL) algorithms to evaluate a policy it is common, given a large state space, to introduce some form of approximation architecture for the value function (VF). The exact form of this architecture can have a significant effect on the accuracy of the VF estimate, however, and determining a suitable approximation architecture can often be a highly complex task. Consequently there is a large amount of interest in the potential for allowing RL algorithms to adaptively generate approximation architectures. We investigate a method of adapting approximation architectures which uses feedback regarding the frequency with which an agent has visited certain states to guide which areas of the state space to approximate with greater detail. This method is "unsupervised" in the sense that it makes no direct reference to reward or the VF estimate. We introduce an algorithm based upon this idea which adapts a state aggregation approximation architecture on-line. A common method of scoring a VF estimate is to weight the squared Bellman error of each state-action by the probability of that state-action occurring. Adopting this scoring method, and assuming $S$ states, we demonstrate theoretically that - provided (1) the number of cells $X$ in the state aggregation architecture is of order $\sqrt{S}\log_2{S}\ln{S}$ or greater, (2) the policy and transition function are close to deterministic, and (3) the prior for the transition function is uniformly distributed - our algorithm, used in conjunction with a suitable RL algorithm, can guarantee a score which is arbitrarily close to zero as $S$ becomes large. It is able to do this despite having only $O(X \log_2S)$ space complexity and negligible time complexity. The results take advantage of certain properties of the stationary distributions of Markov chains.
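
The scoring method mentioned in the abstract, squared Bellman error weighted by state-action probability, can be sketched as follows for a small tabular problem. This is an illustrative reading of that sentence, not code from the paper; the array layout, the discount factor, and all names are assumptions.

```python
import numpy as np

def weighted_bellman_score(Q, P, R, pi, psi, gamma=0.9):
    """Squared Bellman error of Q under policy pi, weighted by psi.

    Q   : value estimate, shape (S, A)
    P   : transition probabilities P[s, a, s'], shape (S, A, S)
    R   : expected rewards, shape (S, A)
    pi  : policy probabilities pi[s, a], shape (S, A)
    psi : probability of each state-action occurring, shape (S, A)
    """
    v_next = np.sum(pi * Q, axis=1)       # expected value of each next state
    backup = R + gamma * (P @ v_next)     # one-step Bellman backup, shape (S, A)
    bellman_error = backup - Q
    return float(np.sum(psi * bellman_error ** 2))
```

A score of zero means the estimate satisfies the Bellman equation on every state-action pair that occurs with positive probability; pairs that are rarely visited contribute little, which is what makes visit frequency a natural guide for where to refine the architecture.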

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

1703.01026

Country: Oceania > Australia (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Non-parametric Approximate Dynamic Programming via the Kernel Method

Bhat, Nikhil, Farias, Vivek, Moallemi, Ciamac C.

Neural Information Processing Systems | Dec-31-2012

This paper presents a novel non-parametric approximate dynamic programming (ADP) algorithm that enjoys graceful, dimension-independent approximation and sample complexity guarantees. In particular, we establish both theoretically and computationally that our proposal can serve as a viable alternative to state-of-the-art parametric ADP algorithms, freeing the designer from carefully specifying an approximation architecture. We accomplish this by developing a kernel-based mathematical program for ADP. Via a computational study on a controlled queueing network, we show that our non-parametric procedure is competitive with parametric ADP approaches.

approximation, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)


Actor-Critic Algorithms

Konda, Vijay R., Tsitsiklis, John N.

Neural Information Processing Systems | Dec-31-2000

We propose and analyze a class of actor-critic algorithms for simulation-based optimization of a Markov decision process over a parameterized family of randomized stationary policies. These are two-time-scale algorithms in which the critic uses TD learning with a linear approximation architecture and the actor is updated in an approximate gradient direction based on information provided by the critic. We show that the features for the critic should span a subspace prescribed by the choice of parameterization of the actor. We conclude by discussing convergence properties and some open problems.
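
As a rough illustration of the structure described above, the sketch below pairs a tabular softmax actor with a linear critic whose features are the gradient of the log-policy, so that the critic's features lie in the subspace determined by the actor's parameterization. It is a simplified discounted-reward sketch under assumed names and constant step sizes, not the algorithm analyzed in the paper; a genuine two-time-scale scheme would use decaying step sizes with the actor's step size vanishing faster than the critic's.

```python
import numpy as np

def softmax_policy(theta, s):
    """Action probabilities in state s for a tabular softmax actor."""
    prefs = theta[s] - np.max(theta[s])
    p = np.exp(prefs)
    return p / p.sum()

def score_features(theta, s, a):
    """Gradient of log pi(a|s) with respect to theta, flattened.

    These serve as the critic's features, tying them to the actor's
    parameterization.
    """
    grad = np.zeros_like(theta)
    grad[s] = -softmax_policy(theta, s)
    grad[s, a] += 1.0
    return grad.ravel()

def actor_critic_step(theta, w, s, a, r, s2, a2,
                      alpha=0.1, beta=0.01, gamma=0.9):
    """One update: TD(0) critic step (larger step), then actor step (smaller step)."""
    f = score_features(theta, s, a)
    f2 = score_features(theta, s2, a2)
    td_error = r + gamma * (w @ f2) - w @ f
    w_new = w + alpha * td_error * f                 # linear critic, TD learning
    q_hat = w_new @ f                                # critic's estimate for (s, a)
    theta_new = theta + beta * q_hat * f.reshape(theta.shape)  # approximate gradient step
    return theta_new, w_new
```

Here `theta` is a float array of shape (states, actions) and `w` has one weight per entry of `theta`.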

algorithm, approximation, value function, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)


Stable Linear Approximations to Dynamic Programming for Stochastic Control Problems with Local Transitions

Van Roy, Benjamin, Tsitsiklis, John N.

Neural Information Processing Systems | Dec-31-1996

Recently, however, there have been some successful applications of neural networks in a totally different context - that of sequential decision making under uncertainty (stochastic control). Stochastic control problems have been studied extensively in the operations research and control theory literature for a long time, using the methodology of dynamic programming [Bertsekas, 1995]. In dynamic programming, the most important object is the cost-to-go (or value) function, which evaluates the expected future ...

algorithm, compact representation, vector, (11 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Africa > Togo (0.05)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)

